Crudely, r could be considered to be the maximum
change in success probability that one would expect 
given that ESP exists. Also, these distributions are 
the "extreme points" over the class of symmetric 
unimodal conditional densities, so answers that hold 
over this class are also representative of answers 
over a much larger class. Note that here $r < 0.25$
(because $0 < \theta < 1$); for the given data the $\theta > 0.5$
are essentially irrelevant, but if it were deemed 
important to take them into account one could use 
the more sophisticated binomial analysis in Berger 
and Delampady (1987). 

For $g_r$, the Bayes factor of $H_1$ to $H_0$, which is to
be interpreted as the relative odds for the hypothe- 
ses provided by the data, is given by 



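$$B(r) = \frac{\frac{1}{2r}\int_{0.25-r}^{0.25+r} \theta^{x}(1-\theta)^{n-x}\,d\theta}{(0.25)^{x}(0.75)^{n-x}},$$

where $x$ is the number of direct hits in $n$ trials. This is graphed in Figure 1.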
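The Bayes factor $B(r)$ is a one-dimensional integral and is easy to evaluate numerically. The Python sketch below is a minimal illustration, not the computation behind Figure 1: it assumes that $g_r$ is the uniform density on $(0.25 - r,\, 0.25 + r)$, uses composite Simpson's rule, and takes $x = 122$ direct hits in $n = 355$ trials, the totals implied by the recent ganzfeld series tabulated in the comment that follows (355 trials at an overall hit rate of 0.344). The function names are hypothetical.

```python
import math


def likelihood_ratio(theta, x, n, theta0=0.25):
    """Binomial likelihood ratio L(theta) / L(theta0); the binomial coefficients cancel."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0  # with 0 < x < n the likelihood vanishes at the endpoints
    log_ratio = (x * math.log(theta / theta0)
                 + (n - x) * math.log((1.0 - theta) / (1.0 - theta0)))
    return math.exp(log_ratio)


def bayes_factor(r, x, n, theta0=0.25, steps=2000):
    """B(r) of H1 to H0 when g_r is Uniform(theta0 - r, theta0 + r).

    Averages the likelihood ratio over the interval with composite
    Simpson's rule; `steps` must be even.
    """
    if not 0.0 < r <= min(theta0, 1.0 - theta0):
        raise ValueError("r must lie in (0, min(theta0, 1 - theta0)]")
    a, b = theta0 - r, theta0 + r
    h = (b - a) / steps
    total = likelihood_ratio(a, x, n, theta0) + likelihood_ratio(b, x, n, theta0)
    for i in range(1, steps):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * likelihood_ratio(a + i * h, x, n, theta0)
    integral = total * h / 3.0
    return integral / (2.0 * r)  # the prior density on the interval is 1/(2r)


# Assumed data: 122 direct hits in 355 trials.
for r in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"r = {r:.2f}   B(r) = {bayes_factor(r, 122, 355):8.1f}")
```

Averaging the likelihood ratio, rather than integrating the raw binomial likelihoods, keeps every quantity well within floating-point range; as $r$ shrinks toward zero the average approaches the likelihood ratio at $\theta = 0.25$, which is 1.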

The $P$-value for this problem was 0.00005, indicating
overwhelming evidence against $H_0$ from a
classical perspective. In contrast to the situation
studied by Jefferys (1990), the Bayes factor here
does not completely reverse the conclusion: there
are very reasonable values of $r$ for
which the evidence against $H_0$ is moderately
strong, for example 100/1 or 200/1. Of course, this
evidence is probably not of sufficient strength to
overcome strong prior opinions against $H_1$ (one
obtains final posterior odds by multiplying prior
odds by the Bayes factor). To properly assess
strength of evidence, we feel that such Bayes factor
computations should become standard in parapsychology.

Fig. 1. The Bayes factor of $H_1$ to $H_0$ as a function of $r$, the
maximum change in success probability that is expected given
that ESP exists, for the ganzfeld experiment.

As mentioned by Professor Utts, Bayesian meth- 
ods have additional potential in situations such as 
this, by allowing unrealistic models of iid trials to 
be replaced by hierarchical models reflecting differ- 
ing abilities among subjects. 
Comment

Ree Dawson 



This paper offers readers interested in statistical 
science multiple views of the controversial history 
of parapsychology and how statistics has con- 
tributed to its development. It first provides an 
account of how both design and inferential aspects
of statistics have been pivotal issues in evaluating 
the outcomes of experiments that study psi abili- 
ties. It then emphasizes how the idea of science as 
replication has been key in this field in which 
results have not been conclusive or consistent and 
thus meta-analysis has been at the heart of the 
literature in parapsychology. The author not only 
reviews past debate on how to interpret repeated 
psi studies, but also provides very detailed informa- 
tion on the Honorton-Hyman argument, a nice 
illustration of the challenges of resolving such
debate. This debate is also a good example of how
statistical criticism can be part of the scientific 
process and lead to better experiments and, in gen- 
eral, better science. 

The remainder of the paper addresses technical 
issues of meta-analysis, drawing upon recent re- 
search in parapsychology for an in-depth applica- 
tion. Through a series of examples, the author 
presents a convincing argument that power issues 
cannot be overlooked in successive replications and 
that comparison of effect sizes provides a richer 
alternative to the dichotomous measure inherent in 
the use of $p$-values. This is particularly relevant
when the potential effect size is small and re- 
sources are limited, as seems to be the case for psi 
studies. 

The concluding section briefly mentions Bayesian 
techniques. As noted by the author, Bayes (or em- 
pirical Bayes) methodology seems to make sense for 
research in parapsychology. This discussion exam- 
ines possible Bayesian approaches to meta-analysis 
in this field. 

BAYES MODELS FOR PARAPSYCHOLOGY 

The notion of repeatability maps well into the 
Bayesian set-up in which experiments, viewed as a 
random sample from some superpopulation of ex- 
periments, are assumed to be exchangeable. When 
subjects can also be viewed as an approximately 
random sample from some population, it is appro- 
priate to pool them across experiments. Otherwise, 
analyses that partially pool information according 
to experimental heterogeneity need to be consid- 
ered. Empirical and hierarchical Bayes methods 
offer a flexible modeling framework for such analy- 
ses, relying on empirical or subjective sources to 
determine the degree of pooling. These richer meth- 
ods can be particularly useful to meta-analysis of 
experiments in parapsychology conducted under 
potentially diverse conditions. 

For the recent ganzfeld series, assuming them
to be independent and binomially distributed, as
discussed in Section 5, the data can be summed
(pooled) across series to estimate a common hit
rate. Honorton et al. (1990) assessed the homogeneity
of effects across the 11 series using a chi-square
test that compares individual effect sizes to
the weighted mean effect. The chi-square statistic
$\chi^2_{10} = 16.25$, not statistically significant ($p =
0.093$), largely reflects the contribution of the last
"special" series (which contributes 9.2 units to the $\chi^2_{10}$
value) and, to a lesser extent, the novice series with
a negative effect (2.5 units). The outlier
series can be dropped from the analysis to provide a
more conservative estimate of the presence of psi
effects for these data (this result is reported in
Section 5). For the remaining 10 series, the chi-square
value $\chi^2_9 = 7.01$ strongly favors homogeneity,
although more than one-third of its value is due to
the novice series (number 4 in Table 1). This
pattern points to the potential usefulness of a richer
model to accommodate series that may be distinct
from the others. For the earlier ganzfeld data
analyzed by Honorton (1985b), the appeal of a Bayes or
other model that recognizes the heterogeneity
across studies is clear-cut: $\chi^2_{23} = 56.6$, $p = 0.0001$,
where only those studies with a common chance hit
rate have been included (see Table 2).
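As an illustration of the homogeneity test, the sketch below computes a Cochran-type statistic $Q = \sum_i w_i (Y_i - \bar Y_w)^2$, with $w_i = 1/\sigma_i^2$, from the logits $Y_i$ and standard errors $\sigma_i$ of Table 1, together with an exact chi-square tail probability for integer degrees of freedom. The effect-size scale used by Honorton et al. (1990) need not be the logit scale, so this $Q$ is not expected to equal 16.25 exactly; the tail function itself gives $p \approx 0.093$ for $\chi^2_{10} = 16.25$, matching the value quoted above. Function names are illustrative.

```python
import math

# (Y_i, sigma_i) for the 11 recent ganzfeld series, copied from Table 1.
SERIES = [(-0.58, 0.44), (-0.71, 0.71), (-0.94, 0.37), (-1.15, 0.33),
          (-0.58, 0.30), (-0.85, 0.31), (-0.58, 0.30), (0.71, 0.87),
          (-0.28, 0.76), (-0.85, 0.31), (0.58, 0.42)]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson sum.
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: one-df tail plus the standard recurrence terms.
    tail = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return tail


def homogeneity(series):
    """Cochran-type Q: weighted squared deviations of Y_i from their weighted mean."""
    w = [1.0 / s ** 2 for _, s in series]
    mean = sum(wi * y for wi, (y, _) in zip(w, series)) / sum(w)
    contributions = [wi * (y - mean) ** 2 for wi, (y, _) in zip(w, series)]
    q = sum(contributions)
    return q, contributions, chi2_sf(q, len(series) - 1)


q_all, contrib, p_all = homogeneity(SERIES)
q_10, _, p_10 = homogeneity(SERIES[:-1])  # series 11 omitted
print(f"11 series: Q = {q_all:.2f} on 10 df, p = {p_all:.3f}")
print(f"10 series: Q = {q_10:.2f} on 9 df, p = {p_10:.3f}")
print(f"share of Q from series 11: {contrib[-1] / q_all:.0%}")
```

On the logit scale, series 11 again accounts for the largest single share of $Q$, and dropping it reduces $Q$ sharply, mirroring the pattern described above.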

Historic reliance on vote-counting approaches to
determine the presence of psi effects makes it
natural to consider Bayes models that focus on the
ensemble of experimental effects from
parapsychological studies, rather than individual estimates.
Recent work in parapsychology that compares
effect sizes across studies, rather than estimating
separate study effects, reinforces the need to
examine this type of model. Louis (1984) develops Bayes
and empirical Bayes methods for problems that
consider the ensemble of parameter values to be
the primary goal, for example, multiple
comparisons. For the simple compound normal model,
$Y_i \sim N(\theta_i, 1)$, $\theta_i \sim N(\mu, \tau^2)$, the standard Bayes
estimates (posterior means)

$$\theta_i^* = \mu + D(Y_i - \mu), \qquad D = \frac{\tau^2}{1 + \tau^2},$$

where the $\theta_i$ represent experimental effects of
interest, are modified approximately to

$$\theta_i^L = \mu + D^{1/2}(Y_i - \mu)$$

when an ensemble loss function is assumed. The
new estimates adjust the shrinkage factor $D$ so
that their sample mean and variance match the
posterior expectation and variance of the $\theta$'s.
Similar results are obtained when the model is
generalized to the case of unequal variances,
$Y_i \sim N(\theta_i, \sigma_i^2)$.

Table 1
Recent ganzfeld series

Series   Series type    N trials   Hit rate     Y_i    σ_i
   1     Pilot                22       0.36   -0.58   0.44
   2     Pilot                 9       0.33   -0.71   0.71
   3     Pilot                36       0.28   -0.94   0.37
   4     Novice               50       0.24   -1.15   0.33
   5     Novice               50       0.36   -0.58   0.30
   6     Novice               50       0.30   -0.85   0.31
   7     Novice               50       0.36   -0.58   0.30
   8     Novice                6       0.67    0.71   0.87
   9     Experienced           7       0.43   -0.28   0.76
  10     Experienced          50       0.30   -0.85   0.31
  11     Experienced          25       0.64    0.58   0.42
         Overall             355       0.34

Table 2
Earlier ganzfeld studies

Study    N trials   Hit rate     Y_i    σ_i
   1           32       0.44   -0.24   0.36
   2            7       0.86    1.82   1.09
   3           30       0.43   -0.28   0.37
   4           30       0.23   -1.21   0.43
   5           20       0.10   -2.20   0.75
   6           10       0.90    2.20   1.05
   7           10       0.40   -0.41   0.65
   8           28       0.29   -0.90   0.42
   9           10       0.40   -0.41   0.65
  10           20       0.35   -0.62   0.47
  11           26       0.31   -0.80   0.42
  12           20       0.45   -0.20   0.45
  13           20       0.45   -0.20   0.45
  14           30       0.53    0.12   0.37
  15           36       0.33   -0.71   0.35
  16           32       0.28   -0.94   0.39
  17           40       0.28   -0.94   0.35
  18           26       0.46   -0.16   0.39
  19           20       0.60    0.41   0.46
  20          100       0.41   -0.36   0.20
  21           40       0.33   -0.71   0.34
  22           27       0.41   -0.36   0.39
  23           60       0.45   -0.20   0.26
  24           48       0.21   -1.33   0.35
Overall       722       0.38
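The ensemble adjustment can be sketched directly for the equal-variance compound normal model. The Python code below is a minimal illustration that treats $\mu$ and $\tau^2$ as known (in practice they would be estimated): it stretches the posterior means $\theta_i^*$ about their average so that their sample mean and variance equal the posterior expectations of the sample mean and variance of the $\theta_i$, and it also returns the large-sample approximation $\mu + D^{1/2}(Y_i - \mu)$. The function name and the observations are hypothetical.

```python
import math


def ensemble_estimates(y, mu, tau2):
    """Posterior means, ensemble-matched estimates, and the D^{1/2} approximation.

    Assumes Y_i ~ N(theta_i, 1) and theta_i ~ N(mu, tau2) with mu, tau2 known.
    """
    k = len(y)
    d = tau2 / (1.0 + tau2)                              # shrinkage factor D
    post = [mu + d * (yi - mu) for yi in y]              # theta_i^* (posterior means)
    centre = sum(post) / k
    spread = sum((p - centre) ** 2 for p in post) / k
    if spread == 0.0:
        raise ValueError("need at least two distinct observations")
    # Posterior expectation of the sample variance of the theta_i:
    # spread of the posterior means plus (k - 1)/k times the posterior variance D.
    target = spread + d * (k - 1) / k
    stretch = math.sqrt(target / spread)                 # >= 1, undoes over-shrinkage
    matched = [centre + stretch * (p - centre) for p in post]
    approx = [mu + math.sqrt(d) * (yi - mu) for yi in y]
    return post, matched, approx


# Hypothetical observations with mu = 0 and tau^2 = 1, so D = 0.5.
y = [-1.2, -0.4, 0.1, 0.9, 1.6]
post, matched, approx = ensemble_estimates(y, mu=0.0, tau2=1.0)
for yi, p, m, a in zip(y, post, matched, approx):
    print(f"Y = {yi:+.2f}   Bayes = {p:+.3f}   matched = {m:+.3f}   approx = {a:+.3f}")
```

The posterior means are, as a set, too tightly clustered to represent the spread of the $\theta_i$; the matched estimates keep the same centre but restore the expected spread, which is what makes counts of estimates above a cut point meaningful.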

For the above model, the fraction of $\theta_i^L$ above (or
below) a cut point $C$ is a consistent estimate of the
fraction of $\theta_i > C$ (or $\theta_i < C$). Thus, the use of
ensemble, rather than component-wise, loss can
help detect when individual effects are above
a specified threshold merely by chance. For the
meta-analysis of ganzfeld experiments, the observed
binomial proportions transformed to the logit (or
arcsine square-root) scale can be modeled in this framework.
Letting $d_i$ and $m_i$ denote the number of direct hits
and misses, respectively, for the $i$th experiment, and
$p_i$ the corresponding population proportion of
direct hits, the $Y_i$ are the observed logits

$$Y_i = \log(d_i/m_i),$$

and $\sigma_i^2$, estimated by maximum likelihood as
$1/d_i + 1/m_i$, is the variance of $Y_i$ conditional on
$\theta_i = \mathrm{logit}(p_i)$. The threshold $\mathrm{logit}(0.25) \approx -1.10$ can
be used to identify the number of experiments for
which the proportion of direct hits exceeds that
expected by chance.
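The logit transformation and its variance are simple to compute. The sketch below applies them to series 4 of Table 1, reconstructing its counts as 12 hits and 38 misses from $N = 50$ and the rounded hit rate 0.24 (an assumption, since Table 1 reports only rates); the helper names are hypothetical.

```python
import math


def logit(p):
    """Log odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def observed_logit(hits, misses):
    """Return (Y_i, sigma_i) with Y_i = log(d_i/m_i) and sigma_i^2 = 1/d_i + 1/m_i."""
    if hits <= 0 or misses <= 0:
        raise ValueError("the empirical logit needs at least one hit and one miss")
    return math.log(hits / misses), math.sqrt(1.0 / hits + 1.0 / misses)


THRESHOLD = logit(0.25)  # chance hit rate of 0.25; equals log(1/3)

# Series 4 of Table 1, assuming 12 hits and 38 misses (50 trials at rate 0.24).
y4, s4 = observed_logit(12, 38)
print(f"threshold = {THRESHOLD:.3f}   Y_4 = {y4:.3f}   sigma_4 = {s4:.3f}")
print("below threshold:", y4 < THRESHOLD)
```

These reproduce the tabulated $Y_4 = -1.15$ and $\sigma_4 = 0.33$ to two decimals.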

Table 1 shows $Y_i$ and $\sigma_i$ for the 11 ganzfeld
series. All but one of the series are well above the
threshold; $Y_4$ marginally falls below $-1.10$. Any
shrinkage toward a common hit rate will lead to an
estimate, $\theta_4^*$ or $\theta_4^L$, above the threshold. The use of
ensemble loss (with its consistency property)
provides more convincing support that all $\theta_i > -1.10$,
although posterior estimates of uncertainty are
needed to fully calibrate this. For the earlier
ganzfeld data in Table 2, ensemble loss can
similarly be used to determine the number of studies
with $\theta_i < -1.10$ and specifically whether the
negative effects of studies 4 and 24 ($Y_4 = -1.21$
and $Y_{24} = -1.33$) occurred as a result of chance
fluctuation.
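A rough empirical Bayes version of the shrinkage argument can be checked on the Table 1 values. The sketch below is an illustration under stated assumptions rather than the ensemble analysis itself: it estimates $\tau^2$ with the DerSimonian-Laird moment estimator (a swapped-in choice), plugs it into the unequal-variance model $Y_i \sim N(\theta_i, \sigma_i^2)$, $\theta_i \sim N(\mu, \tau^2)$, and counts how many raw logits and posterior means $\theta_i^* = \mu + D_i(Y_i - \mu)$, with $D_i = \tau^2/(\tau^2 + \sigma_i^2)$, exceed $\mathrm{logit}(0.25)$. It ignores posterior uncertainty, which, as noted above, is needed for full calibration.

```python
import math

# (Y_i, sigma_i) for the 11 recent ganzfeld series, copied from Table 1.
SERIES = [(-0.58, 0.44), (-0.71, 0.71), (-0.94, 0.37), (-1.15, 0.33),
          (-0.58, 0.30), (-0.85, 0.31), (-0.58, 0.30), (0.71, 0.87),
          (-0.28, 0.76), (-0.85, 0.31), (0.58, 0.42)]
THRESHOLD = math.log(0.25 / 0.75)  # logit(0.25), about -1.10


def dersimonian_laird(series):
    """Moment estimate of tau^2, truncated at zero."""
    w = [1.0 / s ** 2 for _, s in series]
    sw = sum(w)
    mean = sum(wi * y for wi, (y, _) in zip(w, series)) / sw
    q = sum(wi * (y - mean) ** 2 for wi, (y, _) in zip(w, series))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(series) - 1)) / denom)


def shrink(series, tau2):
    """Plug-in posterior means theta_i^* = mu + D_i (Y_i - mu)."""
    w = [1.0 / (s ** 2 + tau2) for _, s in series]
    mu = sum(wi * y for wi, (y, _) in zip(w, series)) / sum(w)
    return mu, [mu + tau2 / (tau2 + s ** 2) * (y - mu) for y, s in series]


tau2 = dersimonian_laird(SERIES)
mu, post = shrink(SERIES, tau2)
raw_above = sum(y > THRESHOLD for y, _ in SERIES)
post_above = sum(t > THRESHOLD for t in post)
print(f"tau^2 = {tau2:.3f}   mu = {mu:.3f}")
print(f"above logit(0.25): raw {raw_above}/11, shrunken {post_above}/11")
```

Because every shrunken estimate lies between its $Y_i$ and $\hat\mu$, and $\hat\mu$ is well above $-1.10$, series 4 moves above the threshold, as the text anticipates.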

Features of the ganzfeld data in Section 5, such 
as the outlier series, suggest that further elabora- 
tion of the basic Bayesian set-up may be necessary 
for some meta-analyses in parapsychology. Hierar- 
chical models provide a natural framework to spec- 
ify these elaborations and explore how results 
change with the prior specification. This type of 
sensitivity analysis can expose whether conclusions 
are closely tied to prior beliefs, as observed by 
Jefferys for RNG data (see Section 7). Quantifying
the influence of model components deemed to be 
more subjective or less certain is important to broad 
acceptance of results as evidence of psi performance 
(or lack thereof). 

Consider the initial model commonly used for
Bayesian analysis of discrete data:

$$Y_i \mid p_i, n_i \sim B(p_i, n_i), \qquad \theta_i \sim N(\mu, \tau^2), \quad \theta_i = \mathrm{logit}(p_i),$$

with noninformative priors assumed for $\mu$ and $\tau^2$
(e.g., $\log \tau$ locally uniform). The distinctiveness of
the last "special" series and, in general, the
different types of series (pilot versus formal, novice
versus experienced) raise the question of whether the
experimental effects follow a normal distribution. 
Weighted normal plots (Ryan and Dempster, 1984) 
can be used to graphically diagnose the adequacy of 
second-stage normality (see Dempster, Selwyn and 
Weeks, 1983, for examples with binary response 
and normal superpopulation). 

Alternatively, if nonnormality is suspected, the
model can be revised to include some sort of heavy-tailed
prior to accommodate possibly outlying
series or studies. West (1985) incorporates additional
scale parameters, one for each component of the
model (experiment), that flexibly adapt to atypical
$\theta_i$ and discount their influence on posterior
estimates, thus avoiding under- or over-shrinkage
due to such $\theta_i$. For example, the second stage
can specify the prior as a scale mixture of normals:

$$\theta_i \sim N(\mu, \tau^2/\gamma_i), \qquad \gamma_i \sim \chi^2_\nu/\nu.$$

This approach for the prior is similar to others for 
maximum likelihood estimation that modify the
sampling error distribution to yield estimates that 
are "robust" against outlying observations. 

Like its maximum likelihood counterparts, in
addition to the robust effect estimates $\theta_i^*$, the Bayes
model provides (posterior) scale estimates $\gamma_i^*$. These
can be interpreted as the weight given to the data
for each $\theta_i$ in the analysis and are useful for
diagnosing which model components (series or studies)
are unusual and how they influence the shrinkage.
When more complex groupings among the $\theta_i$ are
suspected, for example, a bimodal distribution of
studies from different sites or experimenters, other
mixture specifications can be used to further relax
the shrinkage toward a common value.

For the 11 ganzfeld series, the last "outlier"
series, quite distinct from the others (hit rate =
0.64), is moderately precise ($N = 25$). Omitting it
from the analysis causes the overall hit rate to drop
from 0.344 to 0.321. The scale mixture model is a
compromise between these two values (on the logit
scale), discounting the influence of series 11 on the
estimated posterior common hit rate used for
shrinkage. The scale factor $\gamma_{11}^*$, an indication of
how separate $\theta_{11}$ is from the other parameters, also
causes $\theta_{11}^*$ to be shrunk less toward the common hit
rate than the other, more homogeneous $\theta_i$, giving more
weight to individual information for that series (see
West, 1985). The heterogeneity of the earlier
ganzfeld data is more pronounced, and studies are
taken from a variety of sources over time. For these
data, the $\gamma_i^*$ can be used to explore atypical studies
(e.g., study 6, with hit rate = 0.90, contributes more
than 25% to the $\chi^2_{23}$ value for homogeneity) and
groupings among effects, as well as protect the
analysis from misspecification of second-stage
normality.
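The compromise produced by a heavy-tailed second stage can be imitated with a few lines of code. The sketch below is not West's (1985) procedure: for simplicity it places a Student $t$ distribution on the marginal deviation of each $Y_i$ from a common logit $\mu$ (with scale $\sqrt{\sigma_i^2 + \tau^2}$ and $\tau^2 = 0$ by default) and estimates $\mu$ by the standard EM iteration for a $t$ location, in which the latent weights $u_i$ play the role of the scale factors $\gamma_i$. The choice $\nu = 4$ and all names are assumptions for illustration.

```python
import math

# (Y_i, sigma_i) for the 11 recent ganzfeld series, copied from Table 1.
SERIES = [(-0.58, 0.44), (-0.71, 0.71), (-0.94, 0.37), (-1.15, 0.33),
          (-0.58, 0.30), (-0.85, 0.31), (-0.58, 0.30), (0.71, 0.87),
          (-0.28, 0.76), (-0.85, 0.31), (0.58, 0.42)]


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_mean(series):
    """Inverse-variance weighted mean of the Y_i (complete pooling)."""
    w = [1.0 / s ** 2 for _, s in series]
    return sum(wi * y for wi, (y, _) in zip(w, series)) / sum(w)


def t_location(series, nu=4.0, tau2=0.0, iters=500, tol=1e-12):
    """EM estimate of a common logit when each Y_i - mu is t_nu with scale sqrt(sigma_i^2 + tau2)."""
    scales = [s ** 2 + tau2 for _, s in series]

    def weights(mu):
        # E-step: expected precision multipliers; small for outlying series.
        return [(nu + 1.0) / (nu + (y - mu) ** 2 / v) for (y, _), v in zip(series, scales)]

    mu = weighted_mean(series)
    for _ in range(iters):
        u = weights(mu)
        # M-step: weighted mean with weights u_i / v_i.
        new_mu = (sum(ui * y / v for ui, (y, _), v in zip(u, series, scales))
                  / sum(ui / v for ui, v in zip(u, scales)))
        done = abs(new_mu - mu) < tol
        mu = new_mu
        if done:
            break
    return mu, weights(mu)


mu_all = weighted_mean(SERIES)
mu_drop = weighted_mean(SERIES[:-1])          # series 11 omitted
mu_rob, u = t_location(SERIES)
for label, m in (("all 11", mu_all), ("t model", mu_rob), ("drop 11", mu_drop)):
    print(f"{label:8s} logit = {m:+.3f}   hit rate = {inv_logit(m):.3f}")
print("smallest weight: series", 1 + min(range(len(u)), key=u.__getitem__))
```

With the Table 1 values, the robust estimate lies between the pooled logits with and without series 11, and series 11 receives the smallest weight.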

Variation among ganzfeld series or studies and 
the degree to which pooling or shrinking is appro- 
priate can be investigated further by considering a 
range of priors for $\tau^2$. If the marginal likelihood of
$\tau^2$ dominates the prior specification, then results
should not vary as the prior for $\tau^2$ is varied.
Otherwise, it is important to identify the degree to which
subjective information about interexperimental 
variability influences the conclusions. This sen- 
sitivity analysis is a Bayesian enrichment of 
the simpler test of homogeneity directed toward 
determining whether or not complete pooling is 
appropriate. 

To assess how well heterogeneity among
historical control groups is determined by the data,
Dempster, Selwyn and Weeks (1983) propose three
priors for $\tau^2$ in the logistic-normal model. The prior
distributions range from strongly favoring
individual estimates, $p(\tau^2)\,d\tau^2 \propto \tau^{-1}\,d\tau^2$, to the uniform
reference prior $p(\tau^2)\,d\tau^2 \propto \tau^{-2}\,d\tau^2$, flat on the $\log \tau$ scale, to
strongly favoring complete pooling, $p(\tau^2)\,d\tau^2 \propto \tau^{-3}\,d\tau^2$
(the latter forcing complete pooling for the
compound normal model; see Morris, 1983). For their
two examples, the results (estimates of linear
treatment effects) are largely insensitive to variation in
the prior distribution, but the number of studies in
each example was large (70 and 19 studies
available for pooling). For the 11 ganzfeld series, $\tau^2$ may
be less well determined by the data. The posterior
estimate of $\tau^2$ and its sensitivity to $p(\tau^2)\,d\tau^2$ will
also depend on whether individual scale
parameters are incorporated into the model. Discounting
the influence of the last series will both shift the
marginal likelihood toward smaller values of $\tau^2$
and concentrate it more in that region.
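How strongly the data determine $\tau^2$ can be explored by computing its marginal likelihood. The sketch below uses the normal approximation $Y_i \mid \mu, \tau^2 \sim N(\mu, \sigma_i^2 + \tau^2)$ to the logistic-normal model, integrates $\mu$ out under a flat prior, and evaluates the result on a grid of $\tau^2$ values in $[0, 1]$ (the grid and its range are assumptions). For each data set it reports the grid mode and the share of the grid mass below $\tau^2 = 0.1$, using a uniform weighting in $\tau^2$ purely for illustration rather than any of the three priors discussed above.

```python
import math

# (Y_i, sigma_i) for the 11 recent ganzfeld series, copied from Table 1.
SERIES = [(-0.58, 0.44), (-0.71, 0.71), (-0.94, 0.37), (-1.15, 0.33),
          (-0.58, 0.30), (-0.85, 0.31), (-0.58, 0.30), (0.71, 0.87),
          (-0.28, 0.76), (-0.85, 0.31), (0.58, 0.42)]

GRID = [0.005 * j for j in range(201)]  # tau^2 values from 0 to 1 (assumed range)


def log_marginal(series, tau2):
    """Log marginal likelihood of tau^2 (up to a constant), mu integrated out.

    Normal approximation: Y_i | mu, tau2 ~ N(mu, sigma_i^2 + tau2), flat prior on mu.
    """
    w = [1.0 / (s ** 2 + tau2) for _, s in series]
    sw = sum(w)
    mu_hat = sum(wi * y for wi, (y, _) in zip(w, series)) / sw
    rss = sum(wi * (y - mu_hat) ** 2 for wi, (y, _) in zip(w, series))
    return 0.5 * sum(math.log(wi) for wi in w) - 0.5 * math.log(sw) - 0.5 * rss


def summarize(series, cut=0.1):
    """Grid mode of the marginal likelihood and its share of grid mass below `cut`."""
    logs = [log_marginal(series, t2) for t2 in GRID]
    top = max(logs)
    lik = [math.exp(v - top) for v in logs]  # rescaled to avoid overflow
    mode = GRID[logs.index(top)]
    h = GRID[1] - GRID[0]

    def area(vals):
        # Trapezoid rule on an evenly spaced grid.
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    below = [v for t2, v in zip(GRID, lik) if t2 <= cut + 1e-12]
    return mode, area(below) / area(lik)


for label, data in (("all 11 series", SERIES), ("series 11 dropped", SERIES[:-1])):
    mode, share = summarize(data)
    print(f"{label:18s} mode of tau^2 = {mode:.3f}   share below 0.1 = {share:.2f}")
```

For the Table 1 values, dropping series 11 moves the mode toward zero and puts a larger share of the likelihood below 0.1, in line with the remark above.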

The issue of objective assessment of experiment 
results is one that extends well beyond the field of 
parapsychology, and this paper provides insight into 
issues surrounding the analysis and interpretation 
of small effects from related studies. Bayes meth- 
ods can contribute to such meta-analyses in two 
ways. They permit experimental and subjective evi- 
dence to be formally combined to determine the 
presence or absence of effects that are not clear cut 
or controversial (e.g., psi abilities). They can also 
help uncover sources and degree of uncertainty in 
the scientific conclusions. 